13 research outputs found

    Controlling for confounding network properties in hypothesis testing and anomaly detection

    An important task in network analysis is the detection of anomalous events in a network time series. These events could merely be times of interest in the network timeline, or they could be examples of malicious activity or network malfunction. Hypothesis testing using network statistics to summarize the behavior of the network provides a robust framework for the anomaly detection decision process. Unfortunately, choosing network statistics that depend on confounding factors such as the total number of nodes or edges can lead to incorrect conclusions (e.g., false positives and false negatives). In this dissertation we describe the challenges that confounding factors pose for anomaly detection in dynamic network streams. We also provide two solutions for avoiding errors due to confounding factors: the first is a randomization testing method that controls for confounding factors, and the second is a set of size-consistent network statistics that avoid confounding due to the most common factors, edge count and node count.

    On Large-Scale Graph Generation with Validation of Diverse Triangle Statistics at Edges and Vertices

    Researchers developing implementations of distributed graph analytic algorithms require graph generators that yield graphs sharing the challenging characteristics of real-world graphs (small-world, scale-free, heavy-tailed degree distribution) with efficiently calculable ground-truth solutions to the desired output. Reproducibility for current generators used in benchmarking is somewhat lacking in this respect due to their randomness: the output of a desired graph analytic can only be compared to expected values and not exact ground truth. Nonstochastic Kronecker product graphs meet these design criteria for several graph analytics. Here we show that many flavors of triangle participation can be cheaply calculated while generating a Kronecker product graph. Given two medium-sized scale-free graphs with adjacency matrices $A$ and $B$, their Kronecker product graph has adjacency matrix $C = A \otimes B$. Such graphs are highly compressible: $|\mathcal{E}|$ edges are represented in $\mathcal{O}(|\mathcal{E}|^{1/2})$ memory and can be built in a distributed setting from small data structures, making them easy to share in compressed form. Many interesting graph calculations have worst-case complexity bounds $\mathcal{O}(|\mathcal{E}|^{p})$, and often these are reduced to $\mathcal{O}(|\mathcal{E}|^{p/2})$ for Kronecker product graphs, when a Kronecker formula can be derived yielding the sought calculation on $C$ in terms of related calculations on $A$ and $B$. We focus on deriving formulas for triangle participation at vertices, $\mathbf{t}_C$, a vector storing the number of triangles that every vertex is involved in, and triangle participation at edges, $\Delta_C$, a sparse matrix storing the number of triangles at every edge. Comment: 10 pages, 7 figures, IEEE IPDPS Graph Algorithms Building Blocks
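    To make the idea of a Kronecker formula concrete, the following is a minimal sketch, assuming undirected graphs with no self-loops. Under that assumption, standard Kronecker and Hadamard product identities give $\mathbf{t}_C = 2\,\mathbf{t}_A \otimes \mathbf{t}_B$ and $\Delta_C = \Delta_A \otimes \Delta_B$; these two forms are derived here for illustration and may not match the exact variants treated in the paper. All helper names are hypothetical, and the example graphs are made up.

```python
# Sketch: check two Kronecker triangle formulas on tiny example graphs.
# Helper names are hypothetical; graphs are undirected with no self-loops.

def matmul(X, Y):
    """Dense product of two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(A, B):
    """Kronecker product of two square matrices."""
    nb = len(B)
    n = len(A) * nb
    return [[A[i // nb][j // nb] * B[i % nb][j % nb] for j in range(n)]
            for i in range(n)]

def edge_triangles(A):
    """Elementwise product of A and A^2: triangles through each edge."""
    A2 = matmul(A, A)
    n = len(A)
    return [[A[i][j] * A2[i][j] for j in range(n)] for i in range(n)]

def vertex_triangles(A):
    """Half the row sums of the edge-triangle matrix: triangles per vertex."""
    return [sum(row) // 2 for row in edge_triangles(A)]

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix from an undirected edge list."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

# A: a triangle with one pendant vertex. B: a 4-clique minus one edge.
A = adjacency(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
B = adjacency(4, [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
C = kron(A, B)

t_A, t_B = vertex_triangles(A), vertex_triangles(B)

# Formula side: computed from the small factors only.
t_C_formula = [2 * a * b for a in t_A for b in t_B]
delta_C_formula = kron(edge_triangles(A), edge_triangles(B))

# Direct side: computed on the full 16-vertex product graph.
t_C_direct = vertex_triangles(C)
delta_C_direct = edge_triangles(C)
```

    The formula side touches only the two 4-vertex factors, while the direct side multiplies 16-by-16 matrices, which is the cost gap the abstract describes.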

    An Ensemble Framework for Detecting Community Changes in Dynamic Networks

    Dynamic networks, especially those representing social networks, undergo constant evolution of their community structure over time. Nodes can migrate between different communities, communities can split into multiple new communities, communities can merge together, etc. In order to represent dynamic networks with evolving communities it is essential to use a dynamic model rather than a static one. Here we use a dynamic stochastic block model where the underlying block model is different at different times. In order to represent the structural changes expressed by this dynamic model, the network is split into discrete time segments and a clustering algorithm assigns block memberships for each segment. In this paper we show that using an ensemble of clustering assignments accommodates the variance in scalable clustering algorithms and produces superior results in terms of pairwise precision and pairwise recall. We also demonstrate that the dynamic clustering produced by the ensemble can be visualized as a flowchart which encapsulates the community evolution succinctly. Comment: 6 pages, under submission to HPEC Graph Challenge
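    The pairwise precision and pairwise recall mentioned above can be sketched as follows, assuming the usual pair-counting definitions: precision is the fraction of node pairs placed in the same predicted block that also share a true block, and recall is the fraction of node pairs sharing a true block that are placed together. The function name and the toy labels are hypothetical, and the paper's evaluation code may differ.

```python
from itertools import combinations

def pairwise_precision_recall(predicted, truth):
    """Pair-counting precision and recall for one block assignment.

    predicted, truth: dicts mapping each node to a block label.
    """
    together_pred = together_true = together_both = 0
    for u, v in combinations(list(predicted), 2):
        same_pred = predicted[u] == predicted[v]
        same_true = truth[u] == truth[v]
        together_pred += same_pred
        together_true += same_true
        together_both += same_pred and same_true
    # Guard against assignments with no co-clustered pairs at all.
    precision = together_both / together_pred if together_pred else 0.0
    recall = together_both / together_true if together_true else 0.0
    return precision, recall

# Toy example: node "c" is assigned to the wrong block.
truth = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
predicted = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}
precision, recall = pairwise_precision_recall(predicted, truth)
```

    Because the measure only compares pairs, it does not depend on how the block labels themselves are numbered, which is convenient when comparing the outputs of several clustering runs.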

    DYMOND: DYnamic MOtif-NoDes Network Generative Model

    Motifs, which have been established as building blocks for network structure, move beyond pair-wise connections to capture longer-range correlations in connections and activity. In spite of this, there are few generative graph models that consider higher-order network structures and even fewer that focus on using motifs in models of dynamic graphs. Most existing generative models for temporal graphs strictly grow the networks via edge addition, and the models are evaluated using static graph structure metrics -- which do not adequately capture the temporal behavior of the network. To address these issues, in this work we propose DYnamic MOtif-NoDes (DYMOND) -- a generative model that considers (i) the dynamic changes in overall graph structure using temporal motif activity and (ii) the roles nodes play in motifs (e.g., one node plays the hub role in a wedge, while the remaining two act as spokes). We compare DYMOND to three dynamic graph generative model baselines on real-world networks and show that DYMOND performs better at generating graph structure and node behavior similar to the observed network. We also propose a new methodology to adapt graph structure metrics to better evaluate the temporal aspect of the network. These metrics take into account the changes in overall graph structure and the individual nodes' behavior over time. Comment: In Proceedings of the Web Conference 2021 (WWW '21)

    Randomization tests for distinguishing social influence and homophily effects

    Relational autocorrelation is ubiquitous in relational domains. This observed correlation between class labels of linked instances in a network (e.g., two friends are more likely to share political beliefs than two randomly selected people) can be due to the effects of two different social processes. If social influence effects are present, instances are likely to change their attributes to conform to their neighbor values. If homophily effects are present, instances are likely to link to other individuals with similar attribute values. Both these effects will result in autocorrelated attribute values. When analyzing static relational networks it is impossible to determine how much of the observed correlation is due to each of these factors. However, the recent surge of interest in social networks has increased the availability of dynamic network data. In this paper, we present a randomization technique for temporal network data where the attributes and links change over time. Given data from two time steps, we measure the gain in correlation and assess whether a significant portion of this gain is due to influence and/or homophily. We demonstrate the efficacy of our method on semi-synthetic data and then apply the method to a real-world social networks dataset, showing the impact of both influence and homophily effects.
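    As a rough illustration of the randomization idea, the sketch below runs a generic label-permutation test for relational autocorrelation on a single network snapshot. This is a simplified stand-in, not the paper's two-time-step procedure for separating influence from homophily. The statistic (fraction of edges whose endpoints share a label), the function names, and the toy graph are all assumptions made for the example.

```python
import random

def same_label_fraction(edges, labels):
    """Fraction of edges whose two endpoints carry the same label."""
    return sum(labels[u] == labels[v] for u, v in edges) / len(edges)

def permutation_p_value(edges, labels, n_perm=1000, seed=0):
    """Permutation test: shuffle labels over nodes, keep the links fixed.

    Returns the observed statistic and a one-sided p-value, i.e. the
    share of shuffles whose statistic is at least the observed one
    (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = same_label_fraction(edges, labels)
    nodes = list(labels)
    values = [labels[n] for n in nodes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        shuffled = dict(zip(nodes, values))
        if same_label_fraction(edges, shuffled) >= observed:
            hits += 1
    return observed, (1 + hits) / (1 + n_perm)

# Toy graph: two 4-cliques joined by one bridge, labels follow the cliques.
clique_1 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
clique_2 = [(u, v) for u in range(4, 8) for v in range(u + 1, 8)]
edges = clique_1 + clique_2 + [(3, 4)]
labels = {n: 0 if n < 4 else 1 for n in range(8)}

observed, p_value = permutation_p_value(edges, labels)
```

    Shuffling labels while holding the links fixed keeps the degree structure and the label proportions constant, so a small p-value points to the pairing of labels with links rather than to either property on its own.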

    Fast Generation of Large Scale Social Networks While Incorporating Transitive Closures

    A key challenge in the social network community is the problem of network generation: how can we create synthetic networks that match characteristics traditionally found in most real-world networks? Important characteristics that are present in social networks include a power law degree distribution, small diameter, and large amounts of clustering. However, most current network generators, such as the Chung Lu and Kronecker models, largely ignore the clustering present in a graph and focus on preserving other network statistics, such as the power law distribution. Models such as the exponential random graph model have a transitivity parameter that can capture clustering, but they are computationally difficult to learn, making scaling to large real-world networks intractable. In this work, we propose an extension to the Chung Lu random graph model, the Transitive Chung Lu (TCL) model, which incorporates the notion of transitive edges. Specifically, it combines the standard Chung Lu model with edges that are formed through transitive closure (e.g., by connecting a ‘friend of a friend’). We prove TCL’s expected degree distribution is equal to the degree distribution of the original input graph, while still providing the ability to capture the clustering in the network. The single parameter required by our model can be learned in seconds on graphs with millions of edges; networks can be generated in time that is linear in the number of edges. We demonstrate the performance of TCL on four real-world social networks, including an email dataset with hundreds of thousands of nodes and millions of edges, showing TCL generates graphs that match the degree distribution, clustering coefficients and hop plots of the original networks.
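    One plausible reading of this combination is sketched below; it is not presented as the authors' algorithm. The assumptions are: the first endpoint is drawn with probability proportional to its degree in the input graph; with probability `rho` the second endpoint is reached by a two-hop walk (a friend of a friend), otherwise it is also drawn by degree as in plain Chung Lu; and each accepted edge replaces the oldest edge so the edge count stays fixed. The name `rho`, the function name, the replacement rule, and the rejection of self-loops and duplicates are all hypothetical choices, and the parameter is set by hand rather than learned.

```python
import random
from collections import deque

def tcl_sketch(seed_edges, rho, n_steps, seed=0):
    """Rewire a simple undirected edge list; returns a list of equal length."""
    rng = random.Random(seed)
    # An endpoint of a uniformly random input edge is a node drawn with
    # probability proportional to its input degree (Chung Lu weights).
    pool = [u for edge in seed_edges for u in edge]
    queue = deque(tuple(sorted(e)) for e in seed_edges)  # oldest edge first
    present = set(queue)
    nbrs = {}
    for u, v in queue:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    done = attempts = 0
    while done < n_steps and attempts < 100 * n_steps:  # guard for dense graphs
        attempts += 1
        vi = rng.choice(pool)
        if rng.random() < rho and nbrs[vi]:
            # Transitive step: walk vi -> vk -> vj in the current graph.
            vk = rng.choice(sorted(nbrs[vi]))
            vj = rng.choice(sorted(nbrs[vk]))
        else:
            # Chung Lu step: second endpoint also drawn by degree.
            vj = rng.choice(pool)
        new = tuple(sorted((vi, vj)))
        if vi == vj or new in present:
            continue  # reject self-loops and duplicate edges
        old = queue.popleft()  # drop the oldest edge to keep the count fixed
        present.discard(old)
        nbrs[old[0]].discard(old[1])
        nbrs[old[1]].discard(old[0])
        queue.append(new)
        present.add(new)
        nbrs[new[0]].add(new[1])
        nbrs[new[1]].add(new[0])
        done += 1
    return list(queue)

# Toy input: a 20-node ring, rewired for 40 steps with rho = 0.5.
ring = [(i, (i + 1) % 20) for i in range(20)]
generated = tcl_sketch(ring, rho=0.5, n_steps=40)
```

    Setting `rho` to 0 reduces the sketch to degree-proportional rewiring only, while larger values close more two-hop paths into triangles and therefore raise clustering.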

    The Impact of Communication Structure and Interpersonal Dependencies on Distributed Teams

    In the past decade, we have witnessed an explosive growth of the Web, online communities, and social media. This has led to a substantial increase in the range and scope of electronic communication and distributed collaboration. In distributed teams, social communication is thought to be critical for creating and sustaining relationships, but there is often limited opportunity for team members to build interpersonal connections through face-to-face interactions. Although social science research has examined some relational aspects of distributed teams, this work has only recently begun to explore the potentially complex relationship between communication, interpersonal relationship formation, and the effectiveness of distributed teams. In this work, we analyze data from an experimental study comparing distributed and co-located teams of undergraduates working to solve logic problems. We use a combined set of tools, including statistical analysis, social network analysis, and machine learning, to analyze the influence of interpersonal communication on the effectiveness of distributed and co-located teams. Our results indicate there are significant differences in participants' self- and group perceptions with respect to: (i) distributed vs. co-located settings, and (ii) communication structures within the team.